Smoke: Fine-grained Lineage at Interactive Speed

نویسندگان

  • Fotis Psallidas
  • Eugene Wu
چکیده

Data lineage describes the relationship between individual input and output data items of a workflow, and has served as an integral ingredient for both traditional (e.g., debugging, auditing, data integration, and security) and emergent (e.g., interactive visualizations, iterative analytics, explanations, and cleaning) applications. The core, long-standing problem that lineage systems need to address—and the main focus of this paper—is to capture the relationships between input and output data items across a workflow with the goal to streamline queries over lineage. Unfortunately, current lineage systems either incur high lineage capture overheads, or lineage query processing costs, or both. As a result, applications, that in principle can express their logic declaratively in lineage terms, resort to hand-tuned implementations. To this end, we introduce Smoke, an in-memory database engine that neither lineage capture overhead nor lineage query processing needs to be compromised. To do so, Smoke introduces tight integration of the lineage capture logic into physical database operators; efficient, write-optimized lineage representations for storage; and optimizations when future lineage queries are known up-front. Our experiments on microbenchmarks and realistic workloads show that Smoke reduces the lineage capture overhead and streamlines lineage queries by multiple orders of magnitude compared to state-of-the-art alternatives. Our experiments on real-world applications highlight that Smoke can meet the latency requirements of interactive visualizations (e.g., <150ms) and outperform hand-written implementations of data profiling primitives.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploring the Use of the Teaching Dimensions Observation Protocol to Develop Fine‐grained Measures of Interactive Teaching in Undergraduate Science Classrooms

to Develop Fine-grained Measures of Interactive Teaching in Undergraduate Science Classrooms (WCER Working Paper 2013-6). Retrieved from University of Wisconsin–Madison, Wisconsin Center for Education Research website: http://www.wcer.wisc.edu/publications/workingPapers/papers.php Exploring the Use of the Teaching Dimensions Observation Protocol to Develop Fine‐grained Measures of Interactive T...

متن کامل

Ultra-Fine Grained Dual-Phase Steels

This paper provides an overview on obtaining low-carbon ultra-fine grained dual-phase steels through rapid intercritical annealing of cold-rolled sheet as improved materials for automotive applications. A laboratory processing route was designed that involves cold-rolling of a tempered martensite structure followed by a second tempering step to produce a fine grained aggregate of ferrite and ca...

متن کامل

Striping for Interactive Video: Is It Worth It?

We study the design of interactive video servers that store videos on disk arrays. In order to avoid the hot–spot problem in video servers it is conventional wisdom to stripe the videos over the disk array using Fine Grained Striping or Coarse Grained Striping techniques. Striping, however, increases the seek and rotational overhead, thereby reducing the throughput of the disk array. Our result...

متن کامل

Supporting Fine-grained Data Lineage in a Database Visualization Environment

The lineage of a datum records its processing history. Because such information can be used to trace the source of anomalies and errors in processed data sets, it is valuable to users for a variety of applications including investigation of anomalies and debugging. Traditional data lineage approaches rely on metadata. However, metadata does not scale well to fine-grained lineage, especially in ...

متن کامل

Fine-grained device management in an interactive media server

The use of interactive media has already gained considerable popularity. Interactivity gives viewers VCR controls like slow-motion, pause, fast-forward, and instant replay. However, traditional server-based or client-based approaches for supporting interactivity either consume too much network bandwidth or require large client buffering; and hence they are economically unattractive. In this pap...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • PVLDB

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2018